Local Sequence Alignment Against a Database Problem
Abstract
Local Sequence Alignment. The local sequence alignment problem is defined as follows: Given two strings S = s_1 ... s_n and T = t_1 ... t_m, a substitution matrix Score, and an insertion/deletion penalty δ, find a pair of substrings s_i ... s_{i+k} of S and t_j ... t_{j+l} of T that have the best overall alignment score, and return the best alignment for them.

Local Sequence Alignment against a database. The local sequence alignment against a database problem extends the local sequence alignment problem by introducing multiple strings against which a single query string is aligned: Given a query string S = s_1 ... s_n, a collection of strings D = {D_1, ..., D_M} (usually referred to as a sequence database), a substitution matrix Score, and an insertion/deletion penalty δ, find the best local alignments of S with any/all strings from D.

Note. The Smith-Waterman algorithm for local sequence alignment has runtime complexity O(nm). When extended to a database D of M strings of average length m, the runtime complexity of applying Smith-Waterman to each pair of strings S, D_j becomes O(nmM). For the large sequence databases used in practice, this is too slow.

Idea. The Smith-Waterman algorithm guarantees the correct solution, i.e., it always finds the best local alignment between a pair of strings. We can sacrifice accuracy in exchange for efficiency by requiring only that our algorithm find reasonably good local alignments against a database quickly.

Approximation algorithms. We consider two approximation algorithms: FASTA and BLAST. Both algorithms are faster than Smith-Waterman. Neither guarantees that it returns the best possible alignments. However, for both ...
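As a rough illustration of the exact baseline described in the abstract above, the following Python sketch applies Smith-Waterman to a query and then to every string in a database, which is where the O(nmM) cost comes from. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (smith_waterman, search_database), the representation of Score as a two-argument function, and the treatment of δ as a positive per-gap penalty that is subtracted are all choices made here for illustration.

```python
from typing import Callable, List, Tuple


def smith_waterman(
    s: str, t: str, score: Callable[[str, str], int], delta: int
) -> Tuple[int, str, str]:
    """Return (best local score, aligned part of s, aligned part of t).

    Assumption: delta is a positive linear gap penalty that is subtracted
    for every inserted or deleted character.
    """
    n, m = len(s), len(t)
    # H[i][j] = best score of a local alignment ending at s[i-1] and t[j-1].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):          # O(n * m) cell updates in total
        for j in range(1, m + 1):
            H[i][j] = max(
                0,                                            # start a new alignment
                H[i - 1][j - 1] + score(s[i - 1], t[j - 1]),  # match or mismatch
                H[i - 1][j] - delta,                          # gap in t
                H[i][j - 1] - delta,                          # gap in s
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    a: List[str] = []
    b: List[str] = []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(s[i - 1], t[j - 1]):
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] - delta:
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return best, "".join(reversed(a)), "".join(reversed(b))


def search_database(
    query: str, database: List[str], score: Callable[[str, str], int], delta: int
) -> List[Tuple[int, int, str, str]]:
    """Align the query against every database string: O(n * m * M) overall.

    Returns (score, database index, aligned query, aligned target) tuples,
    sorted by descending score.
    """
    hits = []
    for idx, target in enumerate(database):
        best, a, b = smith_waterman(query, target, score, delta)
        hits.append((best, idx, a, b))
    hits.sort(key=lambda h: -h[0])
    return hits


def simple_score(x: str, y: str) -> int:
    # Hypothetical substitution scores: +1 for a match, -1 for a mismatch.
    return 1 if x == y else -1


hits = search_database("xxACGTyy", ["TTTT", "zzACGTww", "ACG"], simple_score, delta=1)
print(hits[0])
```

Every database string triggers a full dynamic-programming table fill, so the work grows linearly with M. Heuristic tools such as FASTA and BLAST avoid filling most of these tables, at the price of the optimality guarantee.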
Related Resources
gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences
Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...
Parallelizing the Smith-Waterman Local Alignment Algorithm using CUDA
Given two strings S1 = pqaxabcstrqrtp and S2 = xyaxbacsl, the substrings axabcs in S1 and axbacs in S2 are very similar. The problem of finding similar substrings is the local alignment problem. Local alignment is extensively used in computational biology to find regions of similarity in different biological sequences. Similar genetic sequences are identified by computing the local alignment of...
Efficient Querying on Genomic Databases by Using Metric Space Indexing Techniques
A genomic database consists of a set of nucleotide sequences, for which an important kind of query is local sequence alignment. This paper investigates two different indexing techniques, namely variations of GNAT trees [1] and M-trees [3], to support fast query evaluation for local alignment by transforming the alignment problem into a variant of the metric space neighborhood search problem.
Pairwise sequence alignment using bio-database compression by improved fine tuned enhanced suffix array
Sequence alignment is a bioinformatics application that determines the degree of similarity between nucleotide sequences that are assumed to share ancestral relationships. This sequence alignment method reads a query sequence from the user, aligns it against large genomic sequence data sets, and locates targets that are similar to the input query sequence. Existing accurate algor...